System & Method For Recommending Content Sources

ABSTRACT

A networked computer system identifies, optimizes and recommends content sources for users. The content sources can be used for providing news feeds, search results, etc., based on taking into account the net useful content contributed by such sources relative to other sources.

RELATED APPLICATION DATA

The present application is a continuation of U.S. application Ser. No. 13/292,725, filed Nov. 9, 2011, which claims the benefit under 35 U.S.C. § 119(e) of the priority date of Provisional Application Ser. No. 61/414,370, filed Nov. 16, 2010, both of which are hereby incorporated by reference. The application is further related to U.S. application Ser. No. 13/292,693, filed Nov. 9, 2011 (attorney docket no. JNG 2011-1), which is also incorporated by reference.

FIELD OF THE INVENTION

The present invention relates to systems and methods for determining, optimizing and recommending content sources for messaging, news and search engine systems accessible over electronic networks.

BACKGROUND

There was a time when the number of broadcast sources that an individual could use for obtaining information could be counted on a short list. However, when the Internet was added as a vehicle for providing information, many more sources of information, such as an out-of-town local newspaper, which might have existed previously but were unavailable to a given user, became readily available. Even so, the situation remained manageable in some sense. For example, while out-of-town local newspapers moved their content online and might have sent it to individual users, any given user would only be interested in a few towns, such as the town the user grew up in, and each newspaper had an editorial function that ensured that only articles of general interest were published.

Social networking and information broadcasting sites are now prevalent and widely accessed by Internet users. Significantly, a social network can involve communication emanating from millions of users, and this is, in most cases, too much for any one reader to handle, absorb or use. As a result, there is a need for mechanisms to control the deluge of possible information.

Sometimes, the information is filtered in a “request” manner, i.e., while there might be billions of pages of information available to a user, the user does not have to deal with all of those pages because the user selects a specific page, uses a search engine to identify a specific page to request, or uses other mechanisms so that the user's request is for a specific piece of information. For example, the user can type in the URL (Uniform Resource Locator) from the user's list of bookmarked URLs for the page specific to tomorrow's weather in the user's local town or the page specific to events related to a specific celebrity.

However, at other times, social networks and other tools provide the information to the user in the form of streams of messages. Examples of such message streams include e-mail, instant messages, mobile phone calls, SMS (Short Message Service) messages, and/or the like. Such messages are broadcast from a source to destinations or sent from one source to one destination. In the general case, messages originate at a source and are received at a destination if that source and that destination are linked in a message graph.

In some cases, the source is an individual writing a message assumed to be of interest to the destinations that are linked in the message graph to that source, but the source can also be a business entity, government entity, organizational entity, and/or a computer entity (examples of the latter being hardware and/or software running a program that determines what messages to send and when, which is often useful for automated alerts triggered by computer programming).

While not explicitly spelled out, there is an electronic component that actually sends the message. For example, while it might be said that “celebrity movie star C.M.S. sent a message announcing her presence at a fashion show,” more typically C.M.S. typed a message into some electronic device, such as her smart phone, and pressed “Send,” causing the device to generate the message. Thus, in typical parlance, saying that a person sent a message implies that some electronic device generated the message and sent it into a networked environment, e.g., one where servers know that when a message is received from a particular source (or appears to be received from a particular source), it is forwarded, and/or replicated and forwarded, to destinations according to a message graph. Likewise, at the destination side, there are users (who can be individuals, entities and/or computer elements) that receive messages on destination devices.

One such messaging service is operated by Twitter™, which offers a service by which members can broadcast content in 140-character chunks, known as “Tweet” messages, to anyone in the Twitter™ community. Individual members can choose which user feeds to subscribe to, resulting in an information stream suited to each user's tastes and topics of interest. In the Twitter™ system, destinations are devices (cell phones, web browsers, Twitter™ apps, etc.) that receive Tweet™ messages, and likewise the sources are devices that push Tweet™ messages into the Twitter™ system. The destination and/or source devices can be cell phones with SMS capability, devices with web browser capability, devices that can run specialized Twitter™ apps, or the like.

Twitter™ maintains a message graph mapping sources and destinations. For each edge in the Twitter™ message graph, the destination is said to be “following” the source (sometimes referred to as the “followee”). In other words, if user A “follows” user B, then when user B posts a Tweet™ message, it is provided to user A's list of Tweet™ messages. The graph is a directed graph, i.e., user A following user B does not necessarily imply that user B follows user A. Twitter's™ message graph is colloquially thought of as the lists of everyone's followers.

Another example is the message wall provided to users of Facebook™. Yet another example is comment boards that allow users to post messages and respond to posted messages. Similar considerations are found in multi-media content provider systems, which attempt to introduce media (e.g., a new song or movie) to users based on the consumption habits of other users in the community. Internal knowledge systems, which allow employees to enroll and receive selected emails from other co-workers on particular topics, are yet another example.

Additional references in this area, which are incorporated by reference herein, include:

United States Patent Application 20100299432 to Dotan, directed to managing user information streams.

United States Patent Application 20110029636 to Smyth, which discloses a real-time information feed system.

United States Patent Application 20110153646 to Hong, which discloses a system for triaging information feeds.

United States Patent Application 20110252027 to Chen, which is directed to recommending interesting content in an information stream.

United States Patent Application 20110093520 to Doyle, which is directed to automatically identifying and summarizing content published by key influencers.

A common problem in these kinds of information-following systems, of course, is that users (particularly new users) are challenged to identify appropriate content sources to follow for the topics they are interested in. Twitter™ has addressed this problem, in part, by creating/assembling its own “lists” of entities that it deems most suitable/appropriate for certain categories of content. For the most part, however, these lists tend to be driven more by the celebrity status of the entity, and less by the actual useful information contributed by the entity in question. Twitter™ also lets users make their own lists of people to follow, and one can review and “mine” the lists of others for leads as well. In the end, however, this just pushes the problem back to the end user to find and identify content of interest.

Generalized recommendation engines are known. For example, U.S. Pat. No. 5,583,763, entitled “Method and Apparatus for Recommending Selections Based on Preferences in a Multi-User System,” disclosed that music purchaser selections could be recommended to one user based on a commonality of prior purchases between that one user and other users, for example, recommending song S1 to user U1 because user U2 bought many songs in common with user U1, user U2 also bought song S1, and user U1 has not yet bought song S1.

Follower recommendation systems might perform a similar action with respect to users and whom each of them follows, but there can still be a tendency for lists to become nothing more than popularity driven, in that the same sources appear on every list without regard to their actual utility to the user/topic. In addition, early users/adopters tend to be rewarded beyond their real value, since they will artificially appear in successive lists without regard to their contributions.

The problem of designating which sources to follow will become even more unmanageable as message services become more popular and users start to “follow” more and more publishers of content. At some level of participation, the user's information stream (and overall experience) becomes degraded by the proliferation of duplicate content. Duplicate content threatens the utility of information streams. If content that is useful and nonduplicative (i.e., information, rather than just bits and bytes of data) to a user is considered signal, and duplicate, irrelevant and uninteresting (to that user) messages are considered noise, a desirable goal is to raise the signal-to-noise ratio (“SNR”), preferably by some means better than requiring the user to manually read and delete the noise, or to read and scroll through the noise to get to the signal.

Similar problems exist in other fields as well, including social networking sites, internal e-mailing lists, etc. In fact, where the number of sources is on the same order as the number of destinations, a high percentage of content can be duplicative, and even suggested sources can add still more duplicate content. It can be expected that in any data crawling/aggregation field (including search engines), the identification and selection of appropriate and optimal content sources is a prime concern. Given a finite amount of time and resources to characterize or identify relevant content for a topic, it is desirable to know which sources are more likely to have relevant material.

Clearly, there is a need for systems and methods to improve the signal-to-noise ratio in such systems; existing approaches might attempt to do so, but are not sufficient.

SUMMARY OF THE INVENTION

An object of the present invention, therefore, is to reduce and/or overcome the aforementioned limitations of the prior art.

In a message distribution system according to aspects of the present invention, users who receive messages from sources according to their source selections are provided with indications of the relative utility of additional sources, based on the content of messages from sources already selected by the users.

In specific embodiments, a message graph is maintained by a messaging server or system, and the message graph maps sources to destinations such that a message originated at a source is passed to destinations that have a link (edge) in the message graph from the source to that destination. The links that terminate at a destination correspond to the sources that the user at that destination selected to follow. Where a user has previously selected sources to follow and those sources have generated messages, the content of those messages, possibly also metadata of those messages, as well as other information about the selected sources, is used to provide recommendations to the user seeking to select additional sources, to deselect existing sources, or to select a level of participation with a particular source. In some specific embodiments, the recommendation is in the form of a rating on a rating scale, wherein the rating represents how well the new source is likely to improve an information-to-noise ratio for that user.

A search engine or other query/answer system can also benefit from techniques that optimize a data harvest by focusing on more relevant sources.

Other features of the invention will be apparent upon reading the present disclosure with reference to the figures and other elements of this application.

DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a content source recommendation process implemented in accordance with an exemplary embodiment of the present invention;

FIG. 2 illustrates a format of a typical message processed in accordance with an exemplary embodiment of the present invention;

FIG. 3 depicts a typical table or database constructed as an information unit/classification table in accordance with an exemplary embodiment of the present invention;

FIG. 4 depicts a typical table or database constructed as an entity/information unit table in accordance with an exemplary embodiment of the present invention;

FIG. 5 depicts a typical table or database constructed as an entity/information value table in accordance with an exemplary embodiment of the present invention;

FIG. 6 depicts a preferred embodiment of a computing system employing and supporting the preferred processes described herein;

FIG. 7 depicts a format and substance of a typical entity information overlap correlation table that can be used in accordance with an exemplary embodiment of the present invention;

FIG. 8A depicts a format and substance of a typical entity-content coverage graph/visualizer which can be used in accordance with an exemplary embodiment of the present invention;

FIG. 8B depicts a format and substance of a typical entity-content information gain/coverage graph/visualizer which can be used in accordance with an exemplary embodiment of the present invention;

FIG. 8C depicts a format and substance of a typical topic-content information gain/coverage graph/visualizer which can be used in accordance with an exemplary embodiment of the present invention;

FIG. 8D depicts a format and substance of a typical entity recommender graph/visualizer which can be used in accordance with an exemplary embodiment of the present invention;

FIGS. 8E-8F depict embodiments similar to FIGS. 8A-8D in which visual feedback is presented across all topics followed by the user (FIG. 8E) and additional incremental information gain obtained by adding new users for all topics (FIG. 8F);

FIG. 9 depicts a format and substance of a typical correlation table that can be used in accordance with an exemplary embodiment of the present invention.

DETAILED DESCRIPTION

As will be explained in more detail below, message distribution systems with many sources create problems for users in selecting sources, where the user cannot feasibly follow all sources and many of the sources send messages without necessarily considering whether they are timely, duplicative, useful and/or informative to any particular audience. To address this, the example message distribution systems described below can manage user selection of sources based on indications of the relative utility of additional sources, derived from the content of messages from sources already selected by the users.

In the examples below, specific messaging systems might be described, but unless otherwise indicated, it should be apparent that in general a messaging system comprises some sort of electronics, such as a processor, program code executed by that processor, special-purpose hardware, firmware, software, etc., that allows for the generation and propagation of messages. As used herein, a “message” originates with a “source” and is “passed” to a “destination” according to a “message graph” by a messaging server or service.

A “message” might refer to the unit of traffic of the messaging server/service and could also refer to related metadata. For example, a message might be a 140-character string (or 160-character string, or generalized message) reading “Thank you all for attending my concert. You were a great audience” sent at time T₁. Metadata for the message might include an embedded URL (pointing to a website for the artist, or a concert page) and the time, T₁, of the message. In some systems, a unit message is referred to as a “Tweet” (the Twitter™ system), a “posting” (in a social networking system, a message board, a chat list or the like), or similar. For simplicity, the unit will be referred to in the present examples as a message. The actual technical nature of the message might be a packet, an instant message, a chat message, a voice message, a block of text, or the like. The channels over which messages are passed, as is well known, may represent any number of electronic transmission venues for presenting information to users, including the Internet, a cell phone based link, a wireless link, and other well-known schemes.

As used herein, an “information unit” corresponds to a granularity at which the message distribution system or a source selection system considers messages from the individual publishers/users (“sources”) to measure their contributions. For example, an information unit may be as basic as a URL identifying a document (which could be a text page, a webpage, an image, a video, or some other form of content). In other instances, the information unit may correspond to a larger or higher-level block of content in the message representing some statement about an event, such as “President Obama is now arriving at Camp David,” or “Red Sox 5, Angels 0” or the like. In still other instances, the information unit may represent a natural language engine classification or interpretation of the content, such as, in the latter case, an interpretation that indicates “Red Sox are winning.” Other metadata could be considered as part of the information unit as well. Thus, those skilled in the art will appreciate, after reading this disclosure, that information units can be defined at a number of useful granularities and complexities depending on the needs and requirements of the message distribution system.

A source of a message is generally considered to be some sort of device, hardware or software. In some cases, the source is referred to as a particular individual, business entity, government entity, organizational entity, and/or computer entity, or more precisely, the device, hardware and/or software operated at their behest. An example of a computer entity that is a source is a location server that generates messages without human interaction, based on location information it receives; for example, it may employ computing and software facilities to auto-generate or push content related to the location of an individual based on a reporting function within a portable device.

Likewise, a destination of a message is generally considered to be some sort of device, hardware or software and might be referred to as a particular individual, business entity, government entity, organizational entity, and/or computer entity, or more precisely, the device, hardware and/or software operated at their behest. An example of a computer entity is a client-side application that reads messages and performs some actions based on the messages.

A messaging server or service is an electronic system that receives messages generated by sources and “passes” them to destinations that are scheduled (based on a message graph) to receive them. “Passing” can refer to actually sending the messages, sending them in a broadcast fashion or individually, or merely making them available and/or visible to the destination users according to the user interface they use.

The messaging server or service can maintain the message graph in many different forms. In the simplest case, the message graph can be represented by a (sometimes very large) graph comprising nodes (sources, destinations) and edges connecting those nodes. In some cases, the edges are directed edges, such that the two nodes connected by the edge are connected in one direction only (i.e., a source is a source for a destination, but that does not require a link back from that destination to that source). Of course, two users (nodes) could choose to follow each other, in which case that might be stored as two edges, one in each direction, between those nodes. Herein, where the edges are not symmetric, the source on an edge might be referred to as a “followee” and the destination on that edge as the “follower,” to reflect that the user of the destination device has chosen to “follow” the source, i.e., receive messages that the source might send. The list of sources that a destination follows can be referred to as that destination user's “follow group.”
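As a minimal illustration of the directed message graph and “follow group” just described, the following Python sketch stores each edge once, from followee to follower, so that mutual following becomes two edges. The class and method names (`MessageGraph`, `follow`, `follow_group`, `pass_message`) are hypothetical and are not the API of any particular messaging service.

```python
from collections import defaultdict


class MessageGraph:
    """Hypothetical directed follow graph.

    An edge source -> destination means the destination follows the source.
    Mutual following is stored as two separate edges.
    """

    def __init__(self):
        # followee (source) -> set of followers (destinations)
        self._followers = defaultdict(set)

    def follow(self, follower, followee):
        self._followers[followee].add(follower)

    def unfollow(self, follower, followee):
        self._followers[followee].discard(follower)

    def follow_group(self, destination):
        """Return the set of sources that `destination` follows."""
        return {src for src, fols in self._followers.items() if destination in fols}

    def pass_message(self, source, message):
        """Return (destination, message) pairs for every follower of `source`."""
        # .get() avoids creating an empty entry for sources nobody follows
        return [(dest, message) for dest in sorted(self._followers.get(source, set()))]


g = MessageGraph()
g.follow("A", "B")                    # user A follows user B
print(g.pass_message("B", "hello"))   # [('A', 'hello')]
print(g.pass_message("A", "hi"))      # [] because the edge is directed
```

A deployed service would more likely keep these edges in a database or sharded store; the in-memory dictionary is only for illustration.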

In the typical system, the destination user chooses which sources to follow and from time to time makes changes to that user's “following” list, i.e., adds or deletes sources that the user follows. Of course, the message graph changes accordingly.

As explained in more detail below, where a user has previously selected sources to follow and those sources have generated messages, the content of those messages, possibly also metadata of those messages, as well as other information about the selected sources, is used to provide recommendations to the user seeking to select additional sources, to deselect existing sources, or to select a level of participation with a particular source. In some specific embodiments, the recommendation is in the form of a rating on a rating scale, wherein the rating represents how well the new source is likely to improve an information-to-noise ratio for that user. Similarly, in some embodiments, a particular domain knowledge engine (sports, science, stocks, etc.) or search engine may be recommended to follow a new source that improves the quality, timeliness, etc., of the information covered by the domain in question.

A very specific example is as follows. Suppose User D₁ follows Users S_A, S_B, and S_C. Someone suggests to User D₁ (or the user may discover on their own) that they should follow User S_R because User S_R typically has early or insightful news of political events in a particular region of the world. A source selection system might be part of the message distribution system or an entirely separate system. When User D₁ indicates to the source selection system that User D₁ wants to add User S_R to User D₁'s following list, the source selection system might process/analyze (or have analyzed in advance) the content of messages from Users S_A, S_B, S_C, and S_R and determine that most of the messages from User S_R are delayed copies (or substantial copies) of news already provided by User S_B. Based on that, the system can rate User S_R low (specifically for User D₁) and possibly indicate the reason for the low rating. User D₁ then has more insight into the benefit or utility of adding User S_R and might decline to add him/her.
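The delayed-copy check in this example can be sketched as follows, assuming each message has already been reduced to a (source, information unit, timestamp) triple. The function name `rate_candidate` and the scoring rule (the share of the candidate's information units that no followed source delivered earlier) are illustrative assumptions, not the rating scale of any specific embodiment.

```python
from collections import Counter


def rate_candidate(messages, followed, candidate):
    """Hypothetical novelty rating of `candidate` for one destination user.

    messages:  iterable of (source, info_unit, timestamp) triples
    followed:  set of sources the user already follows
    Returns (score, main_overlap_source). score is the fraction of the
    candidate's information units NOT already delivered earlier (or at the
    same time) by a followed source; (None, None) if the candidate has no messages.
    """
    messages = list(messages)

    # Earliest time (and by whom) each unit reached the user via existing follows.
    first_seen = {}
    for src, unit, ts in messages:
        if src in followed and (unit not in first_seen or ts < first_seen[unit][0]):
            first_seen[unit] = (ts, src)

    # Earliest time the candidate published each unit.
    candidate_units = {}
    for src, unit, ts in messages:
        if src == candidate:
            candidate_units[unit] = min(ts, candidate_units.get(unit, ts))
    if not candidate_units:
        return None, None

    # Count the candidate's units that an existing followee had already delivered.
    scooped_by = Counter()
    for unit, ts in candidate_units.items():
        if unit in first_seen and first_seen[unit][0] <= ts:
            scooped_by[first_seen[unit][1]] += 1

    score = 1.0 - sum(scooped_by.values()) / len(candidate_units)
    reason = scooped_by.most_common(1)[0][0] if scooped_by else None
    return score, reason


msgs = [("S_B", "u1", 1), ("S_B", "u2", 2), ("S_B", "u3", 3),
        ("S_R", "u1", 5), ("S_R", "u2", 6), ("S_R", "u3", 7), ("S_R", "u4", 8)]
print(rate_candidate(msgs, {"S_A", "S_B", "S_C"}, "S_R"))  # (0.25, 'S_B')
```

The second element of the result is one way to “indicate the reason for the low rating”: here, that S_B had already delivered three of S_R's four items.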

In some systems, the decision to follow need not be binary (i.e., follow or not follow), but can be to follow under certain threshold conditions, preferably tested automatically. For example, there are many Twitter™ sources who post newsworthy, funny and/or interesting posts on important topics, but who from time to time talk about something of limited interest. For example, one user might be a good source for information about various political events or entertainment events, but might also post numerous messages about items of limited interest to others (e.g., “protesters are gathering at the town square for Occupy Oakland”, “Wow, C.M.S. is just strolling around the boardwalk, right now!”, “I am going to try to replace my hard drive”, “I got the screws out OK. Wish me luck”, “Now, reinstalling drivers . . . ”, “Success. Shiny new hard drive running smoothly”, “the national news media just showed up at the town square!”). In such cases, to keep up the information-to-noise ratio (assuming the user doesn't actually care about ongoing computer repairs), the destination user might select only those messages from a given source whose level meets or exceeds a configurable threshold. The levels might be a fixed number (e.g., A, B and C levels) or an open-ended number (e.g., some integer where lower integers represent higher levels).

A level of a specific message might be set by the source manually, set at the source side automatically based on some programming parameters, set at the destination side automatically based on some programming parameters and the content of the messages, etc. Thus, a source sender could assign level “A” to important messages clearly of interest to the sender's followers, level “B” to questionable messages and level “C” to messages clearly destined for only the most dedicated of followers. Alternatively, or in addition, the device or system used by the destination user (or part of the messaging system) can automatically consider the content of messages and, based on that content or other metadata, set the level of the message. Then, if the user has specified to follow that source, but only at a threshold level of “B” or above, level “C” messages would not appear in that destination user's message list, inbox or datastream.
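A destination-side filter for the “A/B/C” levels might look like the sketch below. The three-level scale, the dictionary message format, and the function name `visible_messages` are assumptions made for illustration; how a level gets assigned (manually by the source or automatically) is outside the sketch.

```python
# Hypothetical three-level scale: lower rank = more important.
LEVEL_RANK = {"A": 0, "B": 1, "C": 2}


def visible_messages(messages, thresholds):
    """Return the messages a destination user should see.

    messages:   list of dicts with "source", "level" and "text" keys
    thresholds: source -> lowest level the user still wants from that source
                (e.g. "B" means show levels A and B). Sources with no
                threshold are shown at every level.
    """
    shown = []
    for msg in messages:
        limit = thresholds.get(msg["source"])
        if limit is None or LEVEL_RANK[msg["level"]] <= LEVEL_RANK[limit]:
            shown.append(msg)
    return shown


stream = [
    {"source": "X", "level": "A", "text": "the national news media just showed up"},
    {"source": "X", "level": "B", "text": "protesters are gathering"},
    {"source": "X", "level": "C", "text": "Now, reinstalling drivers"},
    {"source": "Y", "level": "C", "text": "unfiltered source"},
]
print([m["text"] for m in visible_messages(stream, {"X": "B"})])
```

With a threshold of “B” on source X, the level “C” message about drivers is dropped, while source Y, which has no threshold, passes through unchanged.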

In other specific embodiments, sources can be formed into logical groups, such that a destination can select a group of sources. This might be useful when a plurality of sources are known to collectively cover a knowledge area with some degree of non-overlap, thus minimizing the effort and data required to follow a particular field. This is typically the case in social networks where multiple users follow each other and, when they see that particular information is already out there, will not send a duplicative message but will limit themselves to messages that add to what has already been posted (and generally refrain from saying “me, too” without more).
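One way to assemble such a group, sketched here under the assumption that each source's published information units are already known, is the standard greedy heuristic for set cover: repeatedly add the source that covers the most not-yet-covered units of the topic. This is a generic, swapped-in technique used for illustration; the disclosure does not specify how groups are formed, and `greedy_source_group` is a hypothetical name.

```python
def greedy_source_group(coverage, topic_units, max_sources):
    """Choose up to `max_sources` sources that jointly cover a topic.

    coverage:    source -> set of information units the source has published
    topic_units: set of information units that define the knowledge area
    Returns (chosen_sources, covered_units).
    """
    chosen, covered = [], set()
    remaining = dict(coverage)
    while remaining and len(chosen) < max_sources:
        # Pick the source adding the most uncovered topic units (ties: by name).
        best = max(sorted(remaining),
                   key=lambda s: len((remaining[s] & topic_units) - covered))
        gain = (remaining[best] & topic_units) - covered
        if not gain:
            break  # no remaining source adds anything new
        chosen.append(best)
        covered |= gain
        del remaining[best]
    return chosen, covered


sources = {"A": {1, 2, 3}, "B": {3, 4}, "C": {1, 2}, "D": {5}}
print(greedy_source_group(sources, {1, 2, 3, 4, 5}, 3))  # (['A', 'B', 'D'], {1, 2, 3, 4, 5})
```

Source C is never chosen because everything it offers is already covered by A, which mirrors the “me, too” messages the paragraph describes.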

As will be explained, the source selection system can apply rigorous tenets of information theory in compiling and presenting content. Embodiments of source selection systems can consider the unique/novel content contributed by each source, and how it compares to that of other sources already available and/or used. In addition, the source selection system can also consider the timeliness of the source with respect to the information items, to further enhance the user's experience by letting the user see developing information more quickly. By using these types of factors, a source selection system can maximize the amount of useful information presented to users over time and minimize the amount of redundant or irrelevant information.

As further described herein, considerations and parameters that some embodiments of a source selection system according to aspects of the present invention can use in the source selection process include: (1) data related to the net total information offered by a particular source or publisher on a topic, including entities within one's follow group; (2) the benefit of information presented by a user within a topic overall versus the total cost of including that user's other, non-topic information within a feed; (3) the overall relevance of a user within a particular topic, as measured by the percentage of the user's total body of content that is relevant to the topic; (4) the ability to present users with ratings of their own relative contributions/coverage within a topic, and to examine the coverage offered by other contributors within a community, in effect providing sources with indications of their own ratings; (5) allowing users to “fill” their stream with content at a certain rate on a certain topic by dynamically adjusting the number of followees that are included in a data feed (e.g., follow A, B, C and D, but if there are more than M messages in a day, stop following D for the rest of the day); and/or (6) allowing users to identify first tier and second tier designees who are used as fallbacks when the data feed becomes slower/less busy with active content (e.g., follow A and B, but if there are fewer than M messages in a week, start following C and D as well).
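Items (3), (5) and (6) above reduce to simple rules. The sketch below assumes messages are already tagged with topics and that message counts per day and per week are tracked elsewhere; the function names and the parameters `daily_cap` and `weekly_floor` are hypothetical stand-ins for the “M messages” thresholds in the examples.

```python
def topic_relevance(message_topics, topic):
    """Item (3): share of a source's messages that are tagged with `topic`.

    message_topics: one set of topic tags per message from the source.
    """
    if not message_topics:
        return 0.0
    return sum(topic in tags for tags in message_topics) / len(message_topics)


def active_sources(always, capped, fallback, msgs_today, msgs_this_week,
                   daily_cap, weekly_floor):
    """Items (5) and (6): decide which followees currently feed the stream.

    always:   sources that are always followed
    capped:   sources dropped for the rest of the day once more than
              `daily_cap` messages have arrived today
    fallback: second-tier sources added while fewer than `weekly_floor`
              messages have arrived this week
    """
    active = set(always)
    if msgs_today <= daily_cap:
        active |= set(capped)
    if msgs_this_week < weekly_floor:
        active |= set(fallback)
    return active


# Item (5) example: follow A, B, C and D, but drop D after more than 50 messages today.
print(sorted(active_sources({"A", "B", "C"}, {"D"}, set(), 60, 300, 50, 20)))
```

The call above prints `['A', 'B', 'C']`, since 60 messages today exceeds the assumed cap of 50.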

Illustrations of Example Systems

FIG. 1 illustrates an example of a content source recommender process 100 that is adapted for presenting useful and valuable suggestions of content sources for a user to follow, taking into account the net information gain (NIG) offered by a particular user's messages both to an overall, general accumulated body of information used within the entire community and, on a more specific level, to another particular user interested in a particular topic. As used herein, the term “recommender” is intended in its broadest sense to refer to an automated computing system that can consider users, content, etc., and develop correlations between the two for the purpose of providing a suggestion or recommendation to a user and/or to an automated knowledge system, such as a search engine, query system, etc.
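The disclosure does not give a formula for NIG at this point. As a minimal stand-in, the sketch below counts a candidate source's information units that are not already present in a reference body, which can be either the community-wide collection or a single user's current feed, optionally restricted to one topic. All names are hypothetical.

```python
def net_information_gain(candidate_units, known_units, unit_topics=None, topic=None):
    """Count the candidate's information units missing from `known_units`.

    candidate_units: information units (e.g. URLs) published by the candidate
    known_units:     units already in the reference body (community or user feed)
    unit_topics:     optional mapping unit -> set of topics
    topic:           if given, only units tagged with this topic are counted
    """
    units = set(candidate_units)
    if topic is not None:
        topics_of = unit_topics or {}
        units = {u for u in units if topic in topics_of.get(u, ())}
    return len(units - set(known_units))


candidate = {"u1", "u2", "u3"}
community = {"u1"}
user_feed = {"u1", "u2"}
print(net_information_gain(candidate, community))  # 2: gain to the community
print(net_information_gain(candidate, user_feed))  # 1: gain to this user
print(net_information_gain(candidate, user_feed, {"u2": {"tech"}, "u3": {"sports"}}, "sports"))  # 1
```

Evaluating the same candidate against both reference bodies reflects the two levels mentioned above: the community as a whole and one particular user interested in a particular topic.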

Note that the information compiled concerning the NIG by users can be used by an operator of a broadcast system for the purpose of building out suggested lists of followers for new users. The latter is described in more detail below, as its applicability is customized for each combination of follower/followee pairings. As is apparent, since users will typically not adopt or follow all content sources, the community-wide suggested listings might or might not be appropriate for their particular content mix. In a similar manner, a search engine operator can determine the NIG given by a particular source with respect to other information sources.

In general, since the content available for publishing within message systems is often limited (in the case of Twitter™, to just 140 characters), a useful piece of information is the presence of URLs or the like linking to other, more content-rich documents. Accordingly, the presence and identity of URLs can be detected and catalogued and thereafter used as the information unit of interest. It should be noted that other types of information within the message could be considered as well, including tags, or hashtags as identified below.

As noted in FIG. 1, the user's (or the system's, in the case of a knowledge/search engine) existing sources are identified at 105. In a typical Twitter™-like embodiment, this could be as simple as identifying the list of entities being followed by the user in question. In other applications, the identity of sources may be derived from examining other data feeds used/adopted by the user, including social network friends, RSS feeds, news story selections (for example, sections of a news aggregator such as Google™ News), interface selections/configurations such as those used to configure a user's home page/social network page, and/or the like. Other examples will be apparent to those skilled in the art upon reading this disclosure.

The content, as noted above, can comprise individual messages that are then identified and compiled at step 110 for the user's list of followees. In a search engine or knowledge engine instance, the content could be derived from pages of websites, databases, etc.

In instances where this data is easily retrievable directly from an existing centralized data store for the messages, the content can be extracted directly. In other instances such data may not be made readily available, so the system can instead be bootstrapped over time to study and compile a separate data store for each user, which may be stored at a message publishing site or on a separate computing system accessible by the user. Each message's author, content and timestamp are identified, stored and indexed as well. The user's location and other metadata can also be stored as desired. The timestamp indicates the time at which the message content was input by the source user.
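A per-user message store of the kind described here could be as simple as the following sketch, which keeps each followee's messages in timestamp order so that later steps can ask who posted what, and when. The record fields mirror the author, content, timestamp and optional metadata mentioned above; the class names `StoredMessage` and `MessageStore` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class StoredMessage:
    author: str
    content: str
    timestamp: datetime                           # when the source user entered the content
    metadata: dict = field(default_factory=dict)  # e.g. location


class MessageStore:
    """Hypothetical per-user store, indexed by author and ordered by time."""

    def __init__(self):
        self._by_author = defaultdict(list)

    def add(self, msg):
        msgs = self._by_author[msg.author]
        msgs.append(msg)
        msgs.sort(key=lambda m: m.timestamp)  # keep each author's list chronological

    def by_author(self, author, since=None):
        """Messages from `author`, optionally only those at or after `since`."""
        msgs = self._by_author.get(author, [])
        return [m for m in msgs if since is None or m.timestamp >= since]


store = MessageStore()
store.add(StoredMessage("S_B", "Red Sox 5, Angels 0", datetime(2011, 11, 9, 21, 5)))
store.add(StoredMessage("S_B", "First pitch!", datetime(2011, 11, 9, 19, 10)))
print([m.content for m in store.by_author("S_B")])  # ['First pitch!', 'Red Sox 5, Angels 0']
```

Re-sorting on every insert is simple but linear per insert; a larger store would more likely use an indexed database table keyed on author and timestamp.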

In addition, at this time the source selection system can explicitly or implicitly determine categories, topics, etc., of interest to the user. For example, the user's content might be mined and mapped automatically, without consultation, to distinct categories or tags (e.g., source-selected tags that are included with a message; in the case of Twitter™, these are referred to as “hashtags,” as they are included with the text of a message and set apart by a # (hash) mark).

Alternatively, the user can be presented with a set of topics (C₁, C₂, . . . , C_k) and asked to express a preference, interest or rating in one or more topics. The topics can range in breadth and scope to include such items as “Sports”, “Finance”, “Entertainment”, “World News”, “Technology”, “Local”, etc. Other examples might be used as well.

These correlations are compiled in a table 900 (see FIG. 9, as an example) for reference in other operations of the system. The substance and format of this table may be varied in accordance with system requirements and goals. For example, users may be permitted to specify binary values (e.g., “1” or “0”, “yes” or “no”) for the relevance/weighting of topics to be considered in evaluating the content contributions of information sources, or non-binary values (e.g., a weight from 1 to 10 or on some other convenient scale).

At step 115, the message content is analyzed to determine appropriate information units, which can include at least uniform resource locators (URLs). In other instances, the presence of user-designated tags can also be used as an information unit. Names of individuals, companies, brands, etc., can also be identified, as noted at 116. Other content can be analyzed as noted above to identify appropriate topics or tags 117 for the message. In this manner, the message is classified by the computing system into one of any number of predefined categories/topics with any number of tags, which, again, can be varied in accordance with system and user needs and objectives. Other operations can also be performed depending on system goals and requirements.
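Steps 110 and 115 might be sketched as follows. This is a minimal illustration, not the described system: the `Message` record mirrors the author/content/timestamp/location fields mentioned above, and the regular expressions, the `extract_information_units` name and the lower-casing of tags are assumptions. Name recognition (116) and the decoding of shortened URLs are not attempted here.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Message:
    """Hypothetical record for one message: author, text, timestamp, location."""
    author: str
    text: str
    timestamp: datetime
    location: Optional[str] = None


# Illustrative patterns: URLs, and hashtags that start a token (not a "#" inside a URL).
URL_RE = re.compile(r"https?://\S+")
TAG_RE = re.compile(r"(?<!\S)#(\w+)")


def extract_information_units(msg: Message) -> set:
    """Return the URLs and source-selected tags found in a message's text."""
    units = set(URL_RE.findall(msg.text))
    units.update("#" + tag.lower() for tag in TAG_RE.findall(msg.text))
    return units


msg = Message("E1", "Big win tonight #RedSox http://bit.ly/abc123", datetime(2011, 11, 9, 20, 0))
print(sorted(extract_information_units(msg)))  # ['#redsox', 'http://bit.ly/abc123']
```

A shortened link such as the one above would still need the additional decoding step before it identifies the referenced web document.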

An example of a message is shown in FIG. 2, illustrating various pieces of information garnered and classified by the source selection system from a message 200, including headers, identifying information, and information units such as URLs. In some cases, URLs may be abbreviated/shortened (by services such as bit.ly and Twitter™ itself), so that an additional decoding operation is necessary to identify the referenced web document. Other forms of messages are usable as well.

FIG. 3 depicts a typical table or database constructed as an information unit/classification unit table 300, used to identify and classify the various information units (IU₁, IU₂, . . . , IU_max) into various topics (C₁, C₂, . . . , C_K). The information units themselves can be stored in any convenient form. For each information unit, a relevance factor R may be employed to indicate a correlation between the information unit and a particular category.

As an example, an information unit IU₁ might be the phrase “Red Sox” and the categories C₁, C₂ might be “Baseball” and “Sports”, respectively. The correlation factors R[1,1] and R[1,2] could be adjusted as desired to indicate a relative weighting of these concepts and mappings to the particular information unit. In this example, “Red Sox” may correlate more highly with “Baseball” than with “Sports”; the correlation factors can be based on any convenient scale, such as 0 to 1, 1 to 10, 1 to 100, or any other desired range. Note that negative correlation values may be useful in some embodiments.
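A lookup against table 300 could look like the sketch below. The relevance values (0.9 for “Baseball”, 0.6 for “Sports”) are invented for illustration on a 0-to-1 scale, and summing R per topic before picking the maximum is only one plausible way to classify a message; neither is specified above.

```python
# Hypothetical excerpt of table 300: relevance factor R[IU][topic] on a 0-1 scale.
RELEVANCE = {
    "red sox": {"Baseball": 0.9, "Sports": 0.6},
    "nasdaq": {"Finance": 0.8, "Technology": 0.3},
}


def topic_scores(units):
    """Sum the relevance factors R of a message's information units, per topic."""
    scores = {}
    for iu in units:
        for topic, r in RELEVANCE.get(iu, {}).items():
            scores[topic] = scores.get(topic, 0.0) + r
    return scores


def classify(units):
    """Return the single best-scoring topic, or None if no unit is in the table."""
    scores = topic_scores(units)
    return max(scores, key=scores.get) if scores else None


print(classify({"red sox"}))  # Baseball
```

Negative R values, which the text allows, would simply pull a topic's summed score down in this scheme.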

Returning to FIG. 1, at step 120, an entity information unit table 400 (FIG. 4) is compiled to indicate, for each entity (E₁, . . . , E_n), which specific information units (IU₁, IU₂, . . . ) each has contributed to the system and, preferably, an associated timestamp (T). In some instances, it may be desirable to assemble the entity/information tables logically by category. This information may take other forms and include other data as needed for any particular application. At this point, an information contribution vector can be compiled and summed for each entity by computing the product of each of the information units (IU) contributed by the entity (whose value can simply be “1”, but can be varied to weight the value of content as well) multiplied by a timestamp value T[1,1]. The timestamp value can be based on a universal date/time clock and have a concatenated form {yearmonthdayhoursecond} or any other well-known system of time measurement. The function used need not be multiplicative and might, for example, be a nonlinear function.

The T factor can be varied according to system requirements based on a value to be attributed to timeliness. Thus, for example, an information unit that is first credited to an original contributor may be associated with a value of 1, while all later contributions of the same content are scaled proportionately or subjected to some form of exponential decay. The T factors can thus be based on any convenient scale, such as 0 to 1, 1 to 10, 1 to 100, or any other desired range. Note that negative timeliness values may be useful in some embodiments.

Depending on system objectives, the T factors can be allocated as discrete values, so that all contributors providing a certain information unit within a block of time T_b, for example, are given the same T factor. This reflects the fact that in many cases a difference of a few minutes may not matter much to the consumer of the content. Routine experimentation can be done to assess the change, or delta, in time ΔT that would nonetheless permit two different time values to be associated with a common timeliness level (T factor).
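The discrete T-factor allocation described above might be sketched as follows, assuming a 0-to-1 scale, a block length T_b of 15 minutes and a decay of e^(−k) per elapsed block with k = 0.5. All three choices are illustrative, as is the `timeliness_factors` helper; contributors falling in the same block as the first one share T = 1.

```python
import math
from datetime import datetime, timedelta


def timeliness_factors(contributions, block=timedelta(minutes=15), k=0.5):
    """Assign a T factor to each (entity, timestamp) contribution of one information unit.

    Contributions within the same block T_b as the earliest one get T = 1;
    each later block multiplies T by exp(-k).
    """
    if not contributions:
        return {}
    first = min(ts for _, ts in contributions)
    factors = {}
    for entity, ts in contributions:
        n_blocks = int((ts - first) / block)  # which discrete time block T_b this falls in
        t = math.exp(-k * n_blocks)
        # If an entity posted the same unit twice, keep its best (earliest) credit.
        factors[entity] = max(factors.get(entity, 0.0), t)
    return factors


t0 = datetime(2011, 11, 9, 12, 0)
print(timeliness_factors([("E1", t0), ("E2", t0 + timedelta(minutes=5))]))  # {'E1': 1.0, 'E2': 1.0}
```

The block length plays the role of the ΔT that the text suggests tuning by routine experimentation.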

The net information contributed by the n-th entity, E[n], can then be described by an information set, as illustrated in Equation 1, identifying every unique information unit contributed by E[n]:

$$E[n] = \{UI_1, UI_2, \ldots, UI_k, \ldots\} \qquad \text{(Eqn. 1)}$$

If the timeliness of the information is also factored into the analysis, then the value of the information contributed by the n-th entity E[n] might be calculated by the system in vector form according to Equation 2:

$$VE[n] = \left(UIC_1 \cdot T_1,\ UIC_2 \cdot T_2,\ \ldots,\ UIC_k \cdot T_k,\ \ldots,\ UIC_{max} \cdot T_{max}\right) \qquad \text{(Eqn. 2)}$$

where UIC_k reflects a value attributed to the particular entity E[n] for contributing (or not contributing) the k-th information unit UI_k. In this instance, UIC_k is nominally equal to unity (1) when the entity has contributed UI_k to the datastream, and is zero otherwise for that information unit slot. The allocation of the information units can easily be automated by starting with the corpus of documents/data, identifying the information units, and crediting them where appropriate to the network users.
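Equation 2 can be built directly, as in the sketch below. The helper name `value_vector` and its arguments are hypothetical. Each slot holds UIC_k·T_k, where UIC_k is the nominal value 1 (or an optional override, such as a negative value for spam) when the entity contributed UI_k and 0 otherwise. Summing the vector gives one scalar score for the entity.

```python
def value_vector(entity_units, all_units, t_factors, unit_value=None):
    """Build VE[n] = (UIC_1*T_1, ..., UIC_max*T_max) for one entity.

    entity_units: set of IUs the entity contributed.
    all_units:    ordered list of every IU slot UI_1 .. UI_max.
    t_factors:    dict IU -> T factor earned by this entity (default 1).
    unit_value:   optional dict IU -> nominal value (default 1; e.g. negative for spam).
    """
    unit_value = unit_value or {}
    vec = []
    for iu in all_units:
        uic = unit_value.get(iu, 1.0) if iu in entity_units else 0.0
        vec.append(uic * t_factors.get(iu, 1.0))
    return vec


slots = ["u1", "u2", "u3"]
ve = value_vector({"u1", "u3"}, slots, {"u1": 1.0, "u3": 0.5})
print(ve, sum(ve))  # [1.0, 0.0, 0.5] 1.5
```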

Note that other types of factors, formulas, etc., can be used in alternate embodiments to yield the information value contributed by an entity, or the value of such information, particularly as concerns a particular topic. While in some embodiments all information units (UI) are given the same nominal value (1), some information (with or without regard to a topic) may be valued higher. In fact, for certain content identified or considered as spam, negative values can be assigned to the information units as a mechanism for weeding out irrelevant or undesirable data in the user's stream.

Other entities, such as advertisers, may pay for organic “boosts” of their content to secure a higher information value score for their links within a particular topic. This type of advertising can be used as a complement to, or in lieu of, other traditional forms of in-stream advertisement insertion.

In still other embodiments, a user's location information or identification value might be used to permit the user to adjust/boost the scores of other users/contributors who are geographically closer to them. In addition, a location identification value may be associated with an information unit as well, so that the geographic relevance of an information unit to the user can be computed. This has the effect of biasing the user's experience toward a local flavor of interest.

Furthermore, in some embodiments, it may be desirable to calculate an additional content score based on original content in the message that goes beyond the information unit itself. For example, a message that contains a link to an event may contain additional information, commentary or opinion about the event, which data can be ascribed an additional originality score OC[n]. Metrics for assessing the relative novelty of information are well known in the art, and any convenient mechanism can be used with the present invention. Some individuals can therefore be rewarded for providing additional (non-spam) commentary. This originality score can be added to the information value score (for example, OC1+C1*T1), used as an additional scalar (for example, OC1*C1*T1), or applied in some other desirable manner to affect an overall information value score for the contributor. Other examples will be apparent to those skilled in the art upon reading this disclosure.

As further depicted in FIG. 4, to accommodate the fact that a contributor's value may decline over time (due to a lack of recent contributions), a sliding data window (which can be programmed to span a predefined number of hours, days, weeks, etc.) may be employed during the calculation to determine the extent and value of information contributed. Alternatively, the individual information units may themselves be associated with a time constant that introduces some form of controlled gradual or exponential decay in the value of the information. For example, the value of C1*T1 at time Tf is equal to C1*T1*e^(−k·Tf), where k is a selectable constant. Again, other well-known measures for introducing a decay value associated with staler content can be used as desired.
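The sketch below shows both staleness mechanisms from the preceding paragraph: a sliding window and the decay C·T·e^(−k·Tf). It reads Tf as the age of a contribution in days, which is an assumption since the text does not fix the time unit. The 7-day window, k = 0.1 and the helper names are likewise illustrative.

```python
import math
from datetime import datetime, timedelta


def decayed_value(c, t, age_days, k=0.1):
    """C*T*e^(-k*Tf), reading Tf as the contribution's age in days (an assumption)."""
    return c * t * math.exp(-k * age_days)


def windowed_score(contribs, now, window=timedelta(days=7), k=None):
    """Sum an entity's C*T credits that fall inside a sliding window ending at `now`.

    contribs: list of (c, t, timestamp) tuples. If k is given, each credit is
    also decayed by its age; otherwise the window alone limits staleness.
    """
    total = 0.0
    for c, t, ts in contribs:
        if now - window <= ts <= now:
            if k is None:
                total += c * t
            else:
                age_days = (now - ts).total_seconds() / 86400
                total += decayed_value(c, t, age_days, k)
    return total


now = datetime(2011, 11, 16)
credits = [(1.0, 1.0, now - timedelta(days=1)), (1.0, 0.5, now - timedelta(days=10))]
print(windowed_score(credits, now))  # 1.0 (the 10-day-old credit falls outside the window)
```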

At step 125, the values of the contributions by each contributor, possibly on a topic or category basis, are sorted by a message distribution system into a list or table for later reference. This table can take any suitable form that allows for ease of manipulation and presentation to interested users of the message distribution system. Independent lists can be made on a topic basis and presented to the community of users so that new participants can see the best and most valuable contributors (from an information perspective).

As shown in FIG. 5, therefore, a table can be generated identifying a unique content contribution value score N associated with each of a number of topics C for each entity E. In effect, this table can be used to identify authoritative sources of information on particular topics. Since the system cannot be gamed simply by repeating the same content, or by the popularity of the entity in question, it helps to provide a level playing field for identifying truly original and authoritative sources of useful information. Other forms of tabulation may be employed without deviating from the spirit of the present teachings.
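Step 125 and the FIG. 5 table might be sketched as a nested mapping from topic to entity to score N, followed by a per-topic ranking. The helper names and the alphabetical tie-break are assumptions made for illustration.

```python
from collections import defaultdict


def contribution_table(credits):
    """credits: iterable of (entity, topic, value) -> {topic: {entity: N}}."""
    table = defaultdict(lambda: defaultdict(float))
    for entity, topic, value in credits:
        table[topic][entity] += value
    return table


def top_contributors(table, topic, n=3):
    """Rank entities for one topic by score N, highest first (ties by name)."""
    scores = table.get(topic, {})
    return sorted(scores, key=lambda e: (-scores[e], e))[:n]


credits = [("E1", "NFL", 1.0), ("E2", "NFL", 0.5), ("E1", "NFL", 0.5),
           ("E3", "Finance", 1.0), ("E3", "NFL", 2.0)]
table = contribution_table(credits)
print(top_contributors(table, "NFL"))  # ['E3', 'E1', 'E2']
```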

An additional optional operation that can also be performed at this time is a “uniquing” process, by which some embodiments can classify and differentiate between contributors to identify persons who are most alike in terms of their content contributions and the value of such contributions. In one simple approach, the system can simply identify which entities have contributed the same content, without regard to the time of such contribution. This “overlap” calculation can be done as shown in table 700 in FIG. 7, depicting an overlap score, S(m,n), between each pair of entities En, Em. This type of table can be constructed on a category or topic basis to allow finer granularity in identifying duplicate/unique contributors. When an information value is considered instead, a similar table can be used, where each entry S(m,n) represents a mathematical calculation of the sum of the square root of the squared differences, a well-known formula for comparing two sets of data values. Other techniques can be used for identifying an amount of information value overlap.
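Both forms of table 700 can be sketched as follows. The time-blind form counts the identical information units that each pair of entities shares. The value-based form implements the formula as worded above, a sum over slots of sqrt((a−b)²), which reduces to a sum of absolute differences. If the intended reading is instead the Euclidean distance (the square root of the sum), Python's `math.dist` computes that directly. The helper names are hypothetical.

```python
import math
from itertools import combinations


def overlap_table(contributed):
    """S(m,n): number of identical IUs contributed by both entities, ignoring time.

    contributed: {entity: set of IUs}.
    """
    return {(m, n): len(contributed[m] & contributed[n])
            for m, n in combinations(sorted(contributed), 2)}


def value_distance(vm, vn):
    """Sum over slots of sqrt((vm_k - vn_k)^2); smaller means more alike."""
    return sum(math.sqrt((a - b) ** 2) for a, b in zip(vm, vn))


contributed = {"E1": {"a", "b", "c"}, "E2": {"b", "c"}, "E3": {"d"}}
print(overlap_table(contributed))  # {('E1', 'E2'): 2, ('E1', 'E3'): 0, ('E2', 'E3'): 0}
```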

Table 700 can be consulted, as described below, for each user's list to identify entities within the user's existing list of followees that are effectively redundant sources within the user's data stream. In other words, after determining that the net information gain from having both sources within the user's stream is small, and given the unnecessary duplication of content, the system may suggest or recommend to the user that one of the two entities be eliminated. Particularly in fields where the same content tends to be repeated by a large group of individuals, this technique has the advantage of eliminating a large number of “parrot” contributors whose only contributions are of this form. In some cases, too, groups of individuals are known to act in concert to spam or pollute a data stream with self-serving advertising or other content. Use of embodiments of the present invention can therefore result in a cleaner, less cluttered stream that is richer in information and reduced in “noise” (duplicate or unwanted content).
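The redundancy check might be sketched as a pairwise scan over the user's followees. For each pair, the sketch computes the share of the smaller entity's output that the other entity also contributes, and it flags the pair when that share meets a threshold. The 0.8 threshold, the keep/drop rule (keep whichever entity contributes more content the other lacks) and the `redundant_pairs` name are all assumptions rather than part of the description.

```python
from itertools import combinations


def redundant_pairs(contributed, followees, threshold=0.8):
    """Return (keep, drop) suggestions for followee pairs that largely duplicate each other.

    contributed: {entity: set of IUs}; threshold is an illustrative parameter.
    """
    suggestions = []
    for m, n in combinations(sorted(followees), 2):
        a, b = contributed[m], contributed[n]
        smaller = min(len(a), len(b))
        if smaller == 0:
            continue
        if len(a & b) / smaller >= threshold:
            # Keep whichever entity contributes more content the other lacks.
            keep, drop = (m, n) if len(a - b) >= len(b - a) else (n, m)
            suggestions.append((keep, drop))
    return suggestions


contributed = {"E1": {"a", "b", "c", "d"}, "E2": {"a", "b", "c"}, "E3": {"x", "y"}}
print(redundant_pairs(contributed, ["E1", "E2", "E3"]))  # [('E1', 'E2')]
```

Because the scan is pairwise, a user acting on several suggestions at once should re-run it after each removal.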

An additional related operation can be performed at step 130 to identify a relative “coverage” score for each contributor on a topic. Qualitatively, this calculation represents a rough measure of the authority that a particular entity has in a field, obtained by comparing the entity not only to other entities but also to the topic as a whole. Based on this calculation, and other data gleaned from the uniquing operations discussed above, the system can determine for a particular user the amount of coverage that the user's followees provide for a particular topic. This is illustrated as a visual/numeric score in FIG. 8A. The coverage score helps the system (and users) identify which entities need to be added to a user's followee list to cover a particular topic comprehensively. While it can be expected that not all possible content for a particular topic can be covered by a small number of entities, the user can be allowed to fine-tune either the coverage or the number of followees to reach a desired goal.
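One simple way to compute the step 130 coverage is shown below. It assumes coverage means the fraction of all information units known for a topic that appear in the combined output of a set of followees. That is an illustrative definition, as is the `coverage` helper. The same function gives a single entity's coverage when passed a one-element list.

```python
def coverage(topic_units, contributed, followees):
    """Fraction of a topic's known IUs that appear in the followees' combined output."""
    if not topic_units:
        return 0.0
    seen = set()
    for e in followees:
        seen |= contributed.get(e, set()) & topic_units
    return len(seen) / len(topic_units)


topic = {"a", "b", "c", "d"}
contributed = {"E1": {"a", "b", "z"}, "E2": {"b", "c"}}
print(coverage(topic, contributed, ["E1"]), coverage(topic, contributed, ["E1", "E2"]))  # 0.5 0.75
```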

By monitoring users' adoption of content contributors, the present system can also easily identify and quantify the value of the information being contributed. That is, the information unit values noted above can have a non-binary utility value C, which can be adjusted dynamically based on measuring what information the community considers valuable. For example, each information unit may begin with a nominal value of 0.5, and this value is increased or decreased based on consumption of the content in the IU and/or an adoption rate for users who provided such content into the system. The range for the values may again be bounded by 0 and 1, or by some other convenient mechanism. In this way, the value of information can be further quantified with reference to its consumers, who can thereby influence the behavior of the system.
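The dynamic utility value C could be updated as in this sketch, which starts at the nominal 0.5 and moves C a fraction `rate` of the way toward an observed consumption ratio. The consumption signal (consumed/shown, e.g. clicks per impression) and the moving-average update rule are assumptions; the text leaves the adjustment mechanism open. The result is clamped to the [0, 1] bounds mentioned above.

```python
def update_unit_value(value, consumed, shown, rate=0.1):
    """Nudge an IU's utility value C toward its observed consumption ratio.

    consumed/shown is an assumed signal (e.g. clicks per impression);
    the result is clamped to the [0, 1] bounds.
    """
    if shown <= 0:
        return value  # no exposure yet, nothing to learn from
    observed = consumed / shown
    new = value + rate * (observed - value)
    return min(1.0, max(0.0, new))


c = 0.5  # nominal starting value
c = update_unit_value(c, consumed=10, shown=10)
print(round(c, 2))  # 0.55
```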

The coverage for the user's topic (or topics) can be depicted in numerical, graphical or some other useful visual form to help illustrate the extent of, and any gaps in, the user's datastream concerning the topic in question. The coverage might be depicted in a two-dimensional pie chart or square chart that identifies visually how much unique information the user perceives, such as seen in FIG. 8A. An indication of the content contributed by individual entities {E1, E2, E3} followed by the user can also be shown for reference, as seen in the overlapping circles. Other variants/embodiments will of course be apparent to those skilled in the art upon reading this disclosure.

Returning to FIG. 1, at step 133, a user may be interested in adding new followees that they have seen or heard about. In this type of scenario, the user is interested in knowing what net value would be gained from including such new followees (F1, F2 . . . ) in his/her data stream. Accordingly, at step 134 the system calculates the net information gain provided by the new followees and presents the user with additional information on the net coverage gained in a topic of interest, along with any information on the followee's level of duplication with another entity already on the user's list. FIG. 8B depicts an exemplary embodiment of a visual presentation of this data. For a topic such as “NFL Football”, for example, it indicates the new coverage based on adding F1, as well as other metrics such as the amount of duplication resulting from adding F1, the amount of off-topic information, etc. The information can be expressed visually, by reference to a certain number of messages over time, or by some other convenient metric. As further seen in FIG. 8B, a visual diagram may show the contribution of F1 over that offered by other entities on the user's existing lists, and the amount of off-topic information. After being presented with such data, the user can elect to accept or reject the proposed followee at step 140.

As an alternate process, as noted above, the user may be informed at step 130 that their content coverage within a particular topic (or across all topics, as needed) is at a certain amount, such as a percentage, within a visual coverage chart. Consequently, as seen in FIG. 8C, the user may indicate to the system that they want to increase their coverage by a certain amount, or to incorporate certain entities 810 that may be identified uniquely on the coverage chart as potential new sources of content (beyond the user's existing list) that are optimal in terms of enhancing the user's coverage within the topic area. This interaction with the coverage graph may be implemented using a conventional interactive slider tool within the user's browser.

Therefore, at step 135 (FIG. 1), the system again identifies these new contributors and automatically suggests/recommends them to the user. For each new followee, the system can provide an indication of the score, percentage, etc. contributed by the recommendation over the user's existing list. The system can account for and review each followee to determine their degree of uniqueness, again vis-à-vis the user's other existing list members. In other words, the system avoids suggesting duplicate content contributors when it gives suggestions on how to increase coverage. Thus, as between a first new candidate/proposed followee with 10% net new coverage and 50% duplication (over existing followees) and a second new candidate/proposed followee with 10% net new coverage and 25% duplication, the system preferably recommends the latter at step 137. As seen in FIG. 8D, therefore, an output list of proposed new entities/followees to be added by the user can be presented.
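The candidate ranking at steps 135/137 might be sketched as follows. The sketch reuses the illustrative coverage definition: net new coverage is the share of the topic's units that a candidate adds beyond the existing followees, and duplication is the share of the candidate's on-topic units already provided. It also computes the off-topic share discussed for non-topic noise. Sorting by highest net new coverage, then lowest duplication, then lowest off-topic share is one assumed ordering. The test data reproduces the 10%/50% versus 10%/25% example.

```python
def candidate_stats(topic_units, contributed, followees, candidate):
    """Return (net_new_coverage, duplication, off_topic) for adding `candidate`."""
    existing = set()
    for e in followees:
        existing |= contributed.get(e, set())
    everything = contributed.get(candidate, set())
    on_topic = everything & topic_units
    new = on_topic - existing
    net_new = len(new) / len(topic_units) if topic_units else 0.0
    duplication = len(on_topic & existing) / len(on_topic) if on_topic else 0.0
    off_topic = len(everything - topic_units) / len(everything) if everything else 0.0
    return net_new, duplication, off_topic


def rank_candidates(topic_units, contributed, followees, candidates):
    """Most net new coverage first; ties go to less duplication, then less off-topic noise."""
    stats = {c: candidate_stats(topic_units, contributed, followees, c) for c in candidates}
    return sorted(candidates, key=lambda c: (-stats[c][0], stats[c][1], stats[c][2]))


topic = {f"u{i}" for i in range(30)}
contributed = {"E1": {f"u{i}" for i in range(10)},
               "F1": {"u10", "u11", "u12", "u0", "u1", "u2"},  # 10% new, 50% duplicated
               "F2": {"u13", "u14", "u15", "u3"}}             # 10% new, 25% duplicated
print(rank_candidates(topic, contributed, ["E1"], ["F1", "F2"]))  # ['F2', 'F1']
```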

This report by the system also contains other useful information, again in the same manner as noted for FIG. 8B, including the resulting duplication, non-topic content, etc., in numeric and/or graphical form. Again, however, the user can accept or reject the candidate followees at step 140 to make the final decision. The user therefore has substantial control over the type of data filter that is superimposed over a datastream.

A further alternative is available in FIG. 1 through step 130 after the user has determined their coverage for a topic. With this feature, the user can simply request at step 136 that the system identify the smallest set of followees required to reach some minimum threshold of coverage. For example, the system could indicate that the set {F1, F2 . . . Fn} would be required to achieve N % coverage, and so on. The minimum sets can be constructed by the system in advance so that a minimum of duplication is presented to the user. The information for this type of feature could be presented at step 138 in the same type of format as shown for FIGS. 8A-8D. It is of course expected that the number of followees required to achieve a certain benchmark coverage value will vary significantly according to topic.
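Finding the truly smallest followee set for a coverage target is an instance of set cover, which is NP-hard in general. This sketch therefore substitutes the standard greedy approximation. At each step it adds the source contributing the most not-yet-covered topic units, and it stops when the target fraction is reached or when no source adds anything new. The result is a small set, but not a guaranteed minimum. `greedy_followee_set` and its `target` parameter are hypothetical names.

```python
def greedy_followee_set(topic_units, contributed, candidates, target=0.8):
    """Greedily pick sources until `target` fraction of the topic is covered.

    Returns (chosen, achieved_coverage). Greedy set cover approximates,
    but does not guarantee, the smallest such set.
    """
    if not topic_units:
        return [], 0.0
    covered, chosen = set(), []
    pool = list(candidates)
    while pool and len(covered) / len(topic_units) < target:
        # max() returns the first candidate with the largest gain, so ties follow input order.
        best = max(pool, key=lambda c: len((contributed.get(c, set()) & topic_units) - covered))
        gain = (contributed.get(best, set()) & topic_units) - covered
        if not gain:
            break  # nobody adds anything new; the target is unreachable
        chosen.append(best)
        covered |= gain
        pool.remove(best)
    return chosen, len(covered) / len(topic_units)


topic = {"a", "b", "c", "d", "e"}
contributed = {"E1": {"a", "b", "c"}, "E2": {"c", "d"}, "E3": {"d", "e"}, "E4": {"a"}}
print(greedy_followee_set(topic, contributed, ["E1", "E2", "E3", "E4"], target=1.0))  # (['E1', 'E3'], 1.0)
```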

In other embodiments, the system can account for other non-topic information contributed by the candidate followee that may be useless or irrelevant to the user, and which effectively acts as noise as far as that user is concerned. That is, the amount of on-topic information may be small compared to the prospective followee's total content contributions, so including that followee may result in the datastream being filled with additional noise. To accommodate this option, the system can again compute a relevancy factor for the prospective followee, as alluded to above. Thus, as between a first new candidate/proposed followee with 10% net new coverage and 50% non-topic information and a second new candidate/proposed followee with 10% net new coverage and 25% non-topic information, the system preferably recommends the latter. Again, however, the user can accept or reject the candidate followees at step 140 to make the final decision.

In any case, the system gives the user information (either numerically or visually) on the net information gain presented by the new proposed followees within the topic. The amount of non-topic information can also be provided to help the user make an informed decision, in the same manner as illustrated previously for the other alternatives shown in FIGS. 8A-8D.

Embodiments of the invention therefore allow users to craft lists of entities to follow within a data feed stream more carefully, maximizing the amount of useful information while removing redundancy, off-topic data, and other noise. The basic calculations performed by such a source selection system to determine the information gained, noise contributions, off-topic contributions, etc., can be done offline, periodically, and/or dynamically as permitted by available computing resources. Because of the amount of content considered by the system, the recommendations for new followees would not be expected to vary significantly from day to day, and the updates can therefore be done at that frequency.

It will be apparent to those skilled in the art upon reading this disclosure that the system can also accommodate optimization of followee lists across multiple topic categories. In other words, if the user specifies (FIG. 1, step 105) that he/she follows categories/topics C1, C2 . . . Ck (FIG. 9), as compiled in table 900, the system can calculate the content contributions and content valuations across these multiple topics using the same methodology as noted above. One difference, of course, is that the computed vectors are now summed across the multiple categories for each contributor. Similarly, the uniquing and duplication calculations would be done across multiple topics for a comprehensive evaluation of the user's followee list.

Accordingly, the inventive processes of the present embodiments can be used to identify an existing coverage across all topics followed by the user (FIG. 8E) and to indicate the additional incremental information gain obtained by adding a new user E1, as seen in FIG. 8F. Features similar to those illustrated in FIGS. 8C and 8D would be implemented in the same manner as described above for the single-topic case.

One notable advantage of multi-topic information gain considerations is that the system can effectively consolidate and identify duplication between entities to reduce an overall user followee list, and thus the amount of information in the stream required to cover a set of topics. For example, a particular candidate content-contributing entity Ec may contribute across multiple topics so as to effectively duplicate the coverage provided by two other separate entities {E1, E2} reporting on the same two topics separately. If Ec contributes less noise in other topics within the feed stream, the system can recommend that entity instead and again increase the overall signal-to-noise ratio of the datastream. It is expected that contributing entities will advertise and exploit their net information gain ratings as a promotional tool and, for this reason, will also be more likely to hone and retain such rankings to maintain the attention and loyalty of a set of followers.

Other additions to, modifications of and variants of this approach to optimizing the user's data feed will be apparent to those skilled in the art from the present teachings.

In other embodiments, the user may have the option of weighting the topics as well, as noted in FIG. 9. For example, the user may be given a range of values to indicate a desired weight (again, using any convenient scale, such as 0 to 1, 1 to 10, 1 to 100, etc.). Alternatively, these topic weightings might be derived automatically by the system from observing the user's behavior, including the posting and reviewing of content related to the topic. As illustrated in FIG. 9, the topic Ck may be weighted only 0.8, compared to 1 for C1, for entity E1, and so on. In some instances, for ease of calculation, the weightings across all topics might be normalized to sum to some unit value.
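Combining per-topic scores with the table 900 weights could look like this sketch. It takes an entity's per-topic scores N (for example, from a FIG. 5-style table) and sums them, each multiplied by the user's topic weight, optionally normalizing the weights to sum to 1. A linear weighted sum is an assumed combination rule; the helper names are illustrative.

```python
def normalize(weights):
    """Scale topic weights so they sum to 1 (left unchanged if they sum to 0)."""
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()} if total else dict(weights)


def weighted_score(per_topic_scores, topic_weights, normalize_weights=True):
    """Combine an entity's per-topic scores N using the user's topic weights (table 900)."""
    w = normalize(topic_weights) if normalize_weights else topic_weights
    return sum(w.get(t, 0.0) * s for t, s in per_topic_scores.items())


weights = {"C1": 1.0, "Ck": 0.8}  # Ck weighted 0.8 relative to C1, as in the FIG. 9 example
print(round(weighted_score({"C1": 2.0, "Ck": 1.0}, weights, normalize_weights=False), 6))  # 2.8
```

Topics the user does not follow get weight 0 here, so contributions on them do not raise the entity's score.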

In other embodiments, new followees can be given a temporary, probationary status within the user's datastream. This status may be altered directly by the user, or may change by default after some period of time unless altered by the user. This allows the user to revert to a prior state of his/her datastream as desired.

In yet other embodiments, processes can be implemented in hardware or software to treat certain content differently depending on various content header field values. For example, a content header may indicate explicitly that the content is a duplicate of some other piece of content already contributed by another member, who is also identified in the header. This is known in the Twitter™ system as “re-Tweeting”, and in some cases the system may not want to penalize a content contributor for this type of content duplication.

In other cases, some feeds automatically include messages from unrelated entities that are not on a followee list simply because an entity that is whitelisted on the user's followee list reproduces a message from that unrelated entity. This is frequently done by an entity that is seeking to populate the datastream with content from others that mentions the entity, i.e., a form of self-promotion. As currently implemented, Twitter™ requires users who wish to prevent this to manually disable this type of feature individually for every entity contributor who happens to re-tweet content, which can be incredibly cumbersome.

The source selection system can be used to automatically detect such duplicate content and suppress/prevent it from flooding a user's datastream. It will be apparent that other types of content can also be classified in different ways to enable the system to ignore such content or treat it differently from the conventional calculations noted above.
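The suppression step might be sketched as a single pass over the incoming stream. The pass drops messages whose normalized text has already been shown, and it drops re-posts whose original author (taken from a header field, here the hypothetical `retweet_of`) is not on the user's followee list. The whitespace-and-case normalization is a deliberately crude stand-in for real duplicate detection. This only filters what is displayed; as noted above, the scoring stage may still choose not to penalize the re-poster.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StreamMessage:
    author: str
    text: str
    retweet_of: Optional[str] = None  # original author named in the header (assumed field)


def filter_stream(messages, followees):
    """Drop repeated content and re-posts of authors the user does not follow."""
    seen, out = set(), []
    for m in messages:
        key = " ".join(m.text.lower().split())  # crude normalized content key
        if key in seen:
            continue  # same content already in the stream
        if m.retweet_of is not None and m.retweet_of not in followees:
            continue  # re-post of an entity the user never chose to follow
        seen.add(key)
        out.append(m)
    return out


stream = [StreamMessage("E1", "Big news"), StreamMessage("E2", "big  news"),
          StreamMessage("E2", "promo", retweet_of="X"), StreamMessage("E2", "Other", retweet_of="E1")]
print([m.text for m in filter_stream(stream, {"E1", "E2"})])  # ['Big news', 'Other']
```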

Another benefit of the novel source selection system is that the value of advertising presented within a datastream is increased, since the datastream is no longer contaminated and overwhelmed with redundant information that causes important content to be buried. The ratio of advertising messages to total messages is increased by removing less useful/noise messages, thus improving the chances of the advertisements being meaningfully absorbed and processed by a user.

As is also known, human beings have a natural saturation point beyond which they cannot handle relationships with new entities and thus cannot process new data from the latter. This is sometimes referred to as “Dunbar's number” and is estimated to be between 100 and 250, with reference literature suggesting it is probably around 150. From even basic examinations of online sites such as Twitter™ and Facebook™, it is already apparent that many users are far past this point and cannot possibly absorb or understand the data presented by their followees. The source selection system therefore allows users to “follow” a larger number of entities, in the sense that the overlap in content contributions can be ignored in most instances. Thus the user can, as a practical matter, remove followees without any penalty because their contributions are already embodied in another feed. In this model, the user is effectively following a much larger number of people than are reflected in their optimized followee list.

In addition, since users now receive a higher percentage of relevant information germane to their topics, they are less likely to miss important details, developments, events, etc. This further improves the user experience relative to other systems, which simply flood the user with every message related to a topic.

Still other embodiments could be used in social networking sites that offer a comparable newsfeed or news stream for their members, such as that offered by Facebook™. For example, a member's social network friends (or other sources in the social network) could be considered for their content contributions to a news feed. Using the processes described above, the message distribution system could determine the optimal mix of friends (or other sources) to be used as updates to comprehensively cover a category. In effect, the user's social network could be used in a manner similar to the followee list discussed above. Alternatively, the invention could be used in social networking architectures that allow a user to unilaterally whitelist or create a connection even to unrelated sources (or members) for purposes of following that source.

In still other embodiments, a website that performs news aggregation (such as done by Yahoo!, Google, MSN, etc.) could employ these techniques to identify unique and valuable contributors of content to its site. Frequently, users of such sites customize their news stories based on particular news topics of interest. In this instance, the news aggregator's news sources could be used and evaluated in a manner similar to the followee list discussed above. The output of a recommender system in this instance would identify other potentially useful sources of information to augment or replace existing content sources. Since most news aggregators rely on locality or popularity to link in/present stories, the system could be used to complement or replace these prior techniques.

In yet another embodiment, the invention could be used to adjust email address lists for entities that permit employees to subscribe to specific topic/knowledge threads. In other words, an employee may have an interest in a particular project or topic within his/her company. By studying the content contained in the organization's emails, a system employing the present teachings could again create customized and optimized lists by topic to permit individuals to follow email threads concerning the topic. Assuming the information is otherwise designated by the email authors as shareable with the individuals seeking access, the system could effectively “whitelist” an individual by automatically designating them to be cc'd or bcc'd on the desired internal messages. This approach again has the advantage of allowing individuals to identify useful sources of information within an enterprise more rapidly.

A preferred embodiment of a computing system 600 employing and supporting the aforementioned preferred processes depicted in FIGS. 1-5 and 7-9 is shown in FIG. 6. As seen therein, a server computing system 610 is preferably a collection of computing machines and accompanying software modules of any suitable form known in the art for performing the operations described above and others associated with typical website support. The software modules described below (usually referenced in the form of a functional engine) can be implemented using any one of many known programming languages suitable for creating applications that can run on client systems and large-scale computing systems, including servers connected to a network (such as the Internet). Such applications can be embodied in tangible, machine-readable form for causing a computing system to execute appropriate operations in accordance with the present teachings. The details of the specific implementation of the present invention will vary depending on the programming language(s) used to embody the above principles, and are not essential to an understanding of the present invention.

As seen in FIG. 6, a number of users N 605 access the computing system over a network (such as the Internet) to obtain a customized datastream feed of the type described above. To simplify the description, only two users are shown, but it should be understood that at any moment in time the number of users can be thousands or millions, depending on available computing resources. As seen in FIG. 6, a first user 1 contributes content over the network to a content Intake Engine 625, which, as noted above, may process the data in accordance with the discussion of FIG. 1 to effectuate the operations associated with steps 110/115 to analyze and format the messages shown in FIG. 2.

The resulting information from the Intake Engine 625 identifying and correlating topics, users, followers, followee lists, content, timestamps, weightings and tables described above is stored in any number of conventional databases 630, which may be relational databases, to optimize speed, data compactness, etc. A Content Extraction/Classification Engine 640 further cooperates with Intake Engine 625 and Databases 630 to analyze the stored and incoming messages to identify topics, information units, contributors, etc., as discussed above for FIG. 1 (steps 110/115) and the information shown at 116/117. This engine is also responsible for the operations noted for step 120 (FIG. 1).
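
As a hedged illustration of the intake and classification steps, the sketch below treats URL links as the information units (one of the forms recited in the claims) and maps each message to topics with a simple keyword match. The keyword matching, the `intake` function and the nested topic/author table layout are assumptions made for this example, not the classification method of the described embodiments.

```python
import re
from collections import defaultdict

# Assumption: an information unit is a URL appearing in the message text.
URL_RE = re.compile(r"https?://\S+")


def intake(messages, topic_keywords):
    """Build a table topic -> author -> set of URLs from (author, text) pairs.

    Topic mapping is a hypothetical keyword match; messages without URLs
    contribute no information units and are skipped.
    """
    table = defaultdict(lambda: defaultdict(set))
    for author, text in messages:
        # Strip trailing punctuation that often sticks to links in prose.
        urls = {u.rstrip(".,);") for u in URL_RE.findall(text)}
        if not urls:
            continue
        lowered = text.lower()
        for topic, keywords in topic_keywords.items():
            if any(k in lowered for k in keywords):
                table[topic][author] |= urls
    return table


msgs = [
    ("a", "New #Python release: https://python.org/x."),
    ("b", "Lunch pic http://img.example/1"),
    ("c", "Python tips https://ex.com/t"),
]
table = intake(msgs, {"python": ["python"]})
print(dict(table["python"]))  # {'a': {'https://python.org/x'}, 'c': {'https://ex.com/t'}}
```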

Returning to FIG. 6, an Information Contribution Engine 650 is responsible for performing the information contribution and value calculations noted above for reference number 120 (FIG. 1). An Information Contributor Engine 655 allocates credits to the individual entities in the manner described above, again with reference to steps 120 to 125. A Coverage Engine 653 is generally responsible for performing the steps noted above as 130, etc., namely, determining the respective coverage by different contributors. This engine cooperates with an Adjustment Engine 654, which effectuates the various coverage adjustment features noted above at reference numerals 133, 134, 135, 136, 137 and 138 for the users. The output of this engine is used by a Followee Recommendation Engine 660, which then provides the reports and suggestions (see FIGS. 8A-8F) within a suitable graphical interface to the user 605 to allow the latter to accept or reject the proposed additions/deletions of followees.
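
The comparison performed by the contribution and recommendation engines can be sketched as follows, under the simplifying assumption that an entity's contribution to a topic is the set of distinct information units it posted, and that net information gain is the count of a candidate's units not already supplied by the base set. The function names and the strict `> threshold` test are illustrative choices; the source does not specify a particular scoring formula.

```python
def collective_units(base, contributions, topic):
    """Union of the information units that the base set (E1..Em) supplies for `topic`."""
    units = set()
    for entity in base:
        units |= contributions.get(entity, {}).get(topic, set())
    return units


def net_information_gain(candidate, base, contributions, topic):
    """Count of the candidate's units for `topic` not already covered by the base set."""
    covered = collective_units(base, contributions, topic)
    return len(contributions.get(candidate, {}).get(topic, set()) - covered)


def recommend(candidates, base, contributions, topic, threshold):
    """Return candidates (not already followed) whose net gain exceeds `threshold`, best first."""
    gains = {
        c: net_information_gain(c, base, contributions, topic)
        for c in candidates
        if c not in base
    }
    chosen = [c for c, g in gains.items() if g > threshold]
    return sorted(chosen, key=lambda c: (-gains[c], c))


contributions = {
    "E1": {"ai": {"u1", "u2"}},
    "E2": {"ai": {"u2", "u3"}},
    "En": {"ai": {"u3", "u4", "u5"}},
    "Ep": {"ai": {"u1", "u2"}},
}
print(recommend(["En", "Ep"], ["E1", "E2"], contributions, "ai", threshold=1))  # ['En']
```

Here (En) adds two units the base set lacks, while (Ep) only repeats what (E1) and (E2) already supply, so only (En) is recommended.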

An Advertising Engine 670 is also employed to feed ads within a datastream in accordance with desired objectives of the datastream provider and an advertiser providing the ad stock. As noted above, since the signal/noise ratio of content can be raised with embodiments of the present invention, it is expected that the perceptibility and utility of advertising should be higher, as it is more likely to be seen.

A Feed Rate Engine 680 operates to carry out another unique operation that can be implemented in some embodiments, namely, a governor or throttling function. This feature allows a user to specify a number of messages or amount of content that they wish to peruse within a datastream within a certain period of time for one or more topics. For example, a user may specify that they want to see only a certain number N of messages per minute, per hour, per day, etc. In some instances they can further specify the overall size of the datastream (number of messages) that can be seen at any moment in time. Since the system and process are extremely effective at identifying useful content, this feature can be used to fill a user's datastream at some desired consumption rate. In effect, in some embodiments, the user could designate both a primary and a secondary set of followees, the latter of which are only accessed (based on some priority, or based on a random selection) and presented on an as-needed basis, such as when the user's datafeed is otherwise below a predetermined (user adjustable) threshold. In this manner, some embodiments of the invention can implement a form of flexible followee lists that respond to a desired interest in a particular topic.

The consumption rate can be adjusted as a number of messages per time period, which can be hours, minutes, etc., depending on the user's goals and system capabilities. Other metrics could be used to allow the user to specify a desired fill rate and a trigger mechanism to determine when to access the secondary content sources.
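
A minimal sketch of the governor described above, assuming one call per user-chosen time window: primary-followee messages are capped at `max_per_window`, and secondary-followee messages are drawn, either by a priority list or at random, only when the primary messages fall below the user's `min_fill` threshold. The `Message` tuple, `fill_window` and its parameter names are hypothetical.

```python
import random
from collections import namedtuple

Message = namedtuple("Message", "author timestamp text")


def fill_window(primary, secondary, max_per_window, min_fill,
                secondary_priority=None, rng=None):
    """Select the messages shown to a user in one time window.

    primary: messages from the user's primary followees in this window.
    secondary: messages from secondary followees, used only as filler.
    max_per_window: user's cap (e.g. N messages per hour).
    min_fill: threshold below which secondary sources are tapped.
    secondary_priority: optional ordered list of secondary authors;
        if omitted, filler is chosen at random.
    """
    shown = sorted(primary, key=lambda m: m.timestamp)[:max_per_window]
    needed = min(min_fill, max_per_window) - len(shown)
    if needed <= 0:
        return shown  # primary followees alone meet the desired fill rate
    pool = list(secondary)
    if secondary_priority is not None:
        rank = {a: i for i, a in enumerate(secondary_priority)}
        pool.sort(key=lambda m: (rank.get(m.author, len(rank)), m.timestamp))
    else:
        (rng or random.Random()).shuffle(pool)
    return shown + pool[:needed]


p = [Message("a", 2, "x"), Message("a", 1, "y")]
s = [Message("s2", 3, "z"), Message("s1", 4, "w")]
window = fill_window(p, s, max_per_window=5, min_fill=3, secondary_priority=["s1", "s2"])
print([m.author for m in window])  # ['a', 'a', 's1']
```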

The user's selected content, as determined by his/her optimized/customized followee list, is then presented through a Data Feed Engine 620, which can be of the same type known in the art for presenting a conventional datastream to the user. A number of techniques for presenting the user's preferred data within a graphical interface (including a mobile interface in some instances) can be used for this purpose. Note that since it is commonly the case that some individual entities will themselves duplicate content (by repeating it over some period of time within a broadcast/feed), the Data Feed Engine can be further programmed to identify such additional duplication and eliminate it from the feed.

By preventing entities from flooding the feed with duplicate content, the signal/noise ratio can be improved even further. Thus, the system can check any new information units presented in messages against prior messages presented in the user's data feed. If a duplicate is detected, it can be selectively blocked depending on the user's setting for such behavior. In some instances, the Data Feed Engine can highlight certain messages visually with some form of enhancement (bolding or flashing, for example) to indicate that they are duplicates of prior content and therefore may be more important. Other mechanisms for indicating the prevalence of a particular information unit can be used, such as providing a numerical indicator alongside or superimposed over the message to indicate the number of members of the community and/or the user's followee list who have broadcast/published the item. It will be apparent that embodiments of the invention can also be used to identify and minimize “spam” using the present teachings.
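
The duplicate check and prevalence indicator can be sketched as below. It assumes each message has already been reduced to a single information unit (for example, a URL), and that prevalence is counted as the number of distinct authors who have posted that unit so far; in `block` mode repeats are dropped, and in `highlight` mode they are kept and flagged. The function name and mode strings are illustrative.

```python
from collections import defaultdict


def process_feed(messages, mode="highlight"):
    """Check each incoming information unit against those already in the feed.

    messages: (author, info_unit) pairs in arrival order.
    mode: "block" drops repeats; "highlight" keeps them and flags them.
    Returns (author, info_unit, prevalence, is_duplicate) tuples, where
    prevalence is the number of distinct authors who have posted the unit
    so far (the numerical indicator shown beside the message).
    """
    posted_by = defaultdict(set)
    feed = []
    for author, unit in messages:
        is_duplicate = unit in posted_by  # membership test does not create a key
        posted_by[unit].add(author)
        if is_duplicate and mode == "block":
            continue
        feed.append((author, unit, len(posted_by[unit]), is_duplicate))
    return feed


stream = [("a", "u1"), ("b", "u1"), ("a", "u2"), ("a", "u1")]
for row in process_feed(stream):
    print(row)
```

Note that an entity repeating its own item (the last message above) is still flagged as a duplicate but does not raise the prevalence count, which only counts distinct authors.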

Other modules that may be advantageously employed, or that are necessary for operation of a website supporting the above processes, might be included as well; they need not be described in detail here, for clarity, but could be implemented as desired. In addition, it is to be understood that these are merely examples, and other applications that require human consumption of content are clearly potential beneficiaries of the aforementioned techniques. Using the systems and/or methods described herein, an additional beneficial effect is provided in that users are introduced to sources of useful information faster than is otherwise possible, and the number of serendipitous discoveries of new sources increases. Since conventional recommender systems tend over time to predominantly favor popular items, they can be less effective at identifying new, interesting sources of information. In some embodiments, the system can be adjusted (when all other information gain factors are otherwise equal) to present the new followees randomly, to eliminate any follower “bias” effect that may creep in over time and which can distort the true value of the contributions of users.
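
One way to realize the random presentation of equally valuable candidates is to sort by net information gain and shuffle only within groups of equal gain, so that follower counts never act as a tie-breaker. This is a sketch under that assumption; `order_candidates` is a hypothetical helper, and a seeded `random.Random` can be passed in for reproducibility.

```python
import random
from collections import defaultdict


def order_candidates(gains, rng=None):
    """Order candidate followees by net information gain, highest first.

    Candidates with equal gain are shuffled rather than ordered by
    popularity, so follower counts never act as a tie-breaker.
    """
    rng = rng or random.Random()
    by_gain = defaultdict(list)
    for candidate, gain in gains.items():
        by_gain[gain].append(candidate)
    ordered = []
    for gain in sorted(by_gain, reverse=True):
        group = sorted(by_gain[gain])  # fixed starting order so a seeded rng is reproducible
        rng.shuffle(group)
        ordered.extend(group)
    return ordered


print(order_candidates({"a": 3, "b": 1, "c": 3, "d": 1}, rng=random.Random(0)))
```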

One advantage is an intelligent tool to prune a social/information system so as to eliminate unnecessary connections that do not contribute anything useful to the user's experience or understanding. It is expected that news organizations can benefit from the present teachings, as they could effectively create comprehensive coverage of topics by aggregating content from multiple sources, coverage that users could then obtain from a single source instead. In other words, a broadcasting entity could use the techniques to identify an optimized minimum set of key contributors within a particular topic space required to achieve a certain percentage of coverage. Then, the entity could track and imitate the content of such a set to effectively create a logical data feed (effectively a type of proxy group) that emulates an optimized information channel for the topic in question. Individual news broadcasters could then compete for end-user attention/followers on a topic-by-topic basis based on their overall signal/noise ratio and coverage rate for the topic in question.
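
Finding the smallest set of contributors that reaches a target coverage is an instance of the set-cover problem, which is NP-hard in general. The sketch below substitutes the standard greedy approximation (repeatedly add the entity contributing the most not-yet-covered units), so it may return more entities than the true minimum. It assumes coverage is measured as the fraction of all information units observed for the topic across the supplied contributors; that definition and the `greedy_cover` name are assumptions.

```python
def greedy_cover(contributions, target):
    """Greedy approximation to the smallest contributor set reaching `target` coverage.

    contributions: entity -> set of information units for one topic.
    target: desired coverage as a fraction (e.g. 0.8 for 80%).
    Returns (chosen entities in pick order, coverage achieved).
    """
    universe = set().union(*contributions.values())
    if not universe:
        return [], 0.0
    covered, chosen = set(), []
    remaining = dict(contributions)
    while len(covered) / len(universe) < target and remaining:
        # Pick the entity adding the most uncovered units; ties go to the
        # alphabetically first name because max() keeps the first maximum.
        best = max(sorted(remaining), key=lambda e: len(remaining[e] - covered))
        new_units = remaining.pop(best) - covered
        if not new_units:
            break  # nobody left adds anything; target is unreachable
        chosen.append(best)
        covered |= new_units
    return chosen, len(covered) / len(universe)


units = {"A": {1, 2, 3, 4}, "B": {3, 4, 5}, "C": {5, 6}, "D": {1}}
print(greedy_cover(units, 0.8))  # (['A', 'C'], 1.0)
```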

As seen in box 658 (FIG. 6), the top contributors by topic can be published to the end-user community for general interest and for use in selecting optimal followees as well. The top contributing entities can also be specially identified or designated within the datafeed by the system to recognize their status as authoritative for the topic in question, based on a net information/information value consideration.
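
Publishing top contributors per topic requires some score for net information value. The sketch below assumes, purely for illustration, that an entity's score is the number of information units in the topic that no other entity contributed, with ties broken alphabetically. Both the scoring rule and the `top_contributors` name are hypothetical.

```python
from collections import Counter


def top_contributors(contributions, k):
    """Return the top-k entities per topic by (assumed) unique contribution.

    contributions: topic -> entity -> set of information units.
    Score: units in the topic that no other entity contributed.
    """
    result = {}
    for topic, by_entity in contributions.items():
        # How many entities supplied each unit within this topic.
        counts = Counter(u for units in by_entity.values() for u in units)
        scores = {
            e: sum(1 for u in units if counts[u] == 1)
            for e, units in by_entity.items()
        }
        result[topic] = sorted(scores, key=lambda e: (-scores[e], e))[:k]
    return result


data = {"ai": {"A": {1, 2, 3}, "B": {3, 4}, "C": {5}}}
print(top_contributors(data, k=2))  # {'ai': ['A', 'B']}
```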

From an advertiser's perspective, advertising can be targeted to messages based on an entity's information/information value contribution within one or more topics, rather than simply on the number of lists on which the entity appears. As it is expected that useful contributing entities will eventually be more heavily followed, an advertiser can benefit from the ensuing adoption of such entities by having them carry the advertiser's content alongside any information units.

It should be noted that at this time new trends in personalization emphasize the importance of a user's entire social graph content over that generated by automated algorithms. However, as explained above, this is not sufficient, since even such content cannot be efficiently managed with current tools. Embodiments of the present invention offer a more desirable datastream for users, since they can combine the best aspects of both human (social graph) curation and automated curation, which helps refine and select the social graph contributions.

It will be understood by those skilled in the art that the above are merely examples and that countless variations on the above can be implemented in accordance with the present teachings. A number of other conventional structures/steps that would be included in a commercial application have been omitted as well, to better emphasize the present teachings.

It will be further apparent to those skilled in the art that the modules of the present invention, including those illustrated in the figures, can be implemented using any one of many known programming languages suitable for creating applications that can run on large-scale computing systems, including servers connected to a network (such as the Internet). The details of the specific implementation of the present invention will vary depending on the programming language(s) used to embody the above principles, and are not material to an understanding of the present invention. Furthermore, in some instances, a portion of the hardware and software will be contained locally on a member's computing system, which can include a portable machine or a computing machine at the user's premises, such as a personal computer, a PDA, a digital video recorder, a receiver, etc.

Furthermore, it will be apparent to those skilled in the art that this is not the entire set of software modules that can be used, or an exhaustive list of all operations executed by such modules. It is expected, in fact, that other features will be added by system operators in accordance with customer preferences and/or system performance requirements. Furthermore, while not explicitly shown or described herein, the details of the various software routines, executable code, etc., required to effectuate the functionality discussed above in such modules are not material to the present invention, and may be implemented in any number of ways known to those skilled in the art. Such code, routines, etc. may be stored in any number of forms of tangible machine-readable media, which are not merely carrier waves or a signal modulated by a carrier over a transmission medium. The above descriptions are intended as merely illustrative embodiments of the proposed inventions. It is understood that the protection afforded the present invention also comprehends and extends to embodiments different from those above, but which fall within the scope of the present claims.

What is claimed is: 1.-18. (canceled)
 19. A method of recommending entities to a user for inclusion as content contributors for a user datastream using a networked computing system comprising: a. automatically identifying with the networked computing system an information contribution provided within a first user datastream by each of a base set of content contributing entities (E1, E2 . . . Em) for a first topic; wherein a collective information contribution is computed for (E1, E2 . . . Em) for said first topic based on measuring individual information units from (E1, E2 . . . Em) mapped by the networked computing system to said first topic; b. for a first candidate contributing entity (En) who is not part of the first user's datastream, automatically identifying with the networked computing system an information contribution for said first topic to the user datastream; wherein a first individual information contribution is computed for (En) for said first topic based on measuring individual information units from (En) mapped by the networked computing system to said first topic; c. automatically comparing with the networked computing system said collective information contribution and said first individual information contribution for said first topic; wherein said comparing includes at least a first computation determining a first net information gain achieved over said collective information contribution by including said first individual information contribution for said first topic; and d. automatically recommending to said user with the networked computing system a recommendation that said first candidate contributing entity (En) be included for contributing content to the user datastream when said first net information gain exceeds a target threshold.
 20. The method of claim 19, further including a step: controlling the user datastream to automatically include content from said first candidate contributing entity (En) in response to the user accepting said recommendation.
 21. The method of claim 19, further including a step: obtaining a target coverage value specified by the user for said first topic, and using said target coverage value to determine an optimal set of content contributing entities required to achieve said coverage value for said first topic within the datastream.
 22. The method of claim 21, wherein said optimal set of content contributing entities includes the smallest number of entities required to achieve said coverage value for said first topic within the datastream.
 23. The method of claim 21, wherein said optimal set of content contributing entities includes the smallest number of entities required both to achieve said coverage value for said first topic within the datastream and maintain a duplication rate below a user selectable value.
 24. The method of claim 20, wherein information duplication is minimized for said first topic by filtering messages that include the same content as prior messages shown within the datastream.
 25. The method of claim 19, wherein the individual information units include uniform resource locator links within electronic messages.
 26. The method of claim 19, wherein steps (b) and (c) are performed over a predetermined time window.
 27. The method of claim 26, wherein said predetermined time window can be selected by the user.
 28. The method of claim 19, further including a step: generating a visual display output for the user, indicating graphically a predicted datastream message feed rate or predicted change in datastream message feed rate achieved based on adding or removing said first candidate contributing entity (En).
 29. The method of claim 19, further including a step: generating a visual display output for the user, indicating graphically a value of a data duplication rate present within the user's datastream as calculated by the networked computing system.
 30. The method of claim 19, wherein individual ones of said base set of content contributing entities are determined automatically for the user in response to the user selecting said first topic for the user datastream.
 31. The method of claim 19, wherein individual ones of said base set of content contributing entities are determined by the user as part of a whitelist.
 32. A method of recommending entities to a user for inclusion as content contributors for a user datastream using a networked computing system comprising: a. automatically identifying with the networked computing system an information contribution provided within a first user datastream by each of a base set of content contributing entities (E1, E2 . . . Em) for a first topic; wherein a collective information contribution is computed for (E1, E2 . . . Em) for said first topic based on measuring individual information units from (E1, E2 . . . Em) mapped by the networked computing system to said first topic; b. for a first candidate contributing entity (En) who is not part of the first user's datastream, automatically identifying with the networked computing system an information contribution for said first topic to the user datastream; wherein a first individual information contribution is computed for (En) for said first topic based on measuring individual information units from (En) mapped by the networked computing system to said first topic; c. automatically comparing with the networked computing system said collective information contribution and said first individual information contribution for said first topic; wherein said comparing includes at least a first computation determining a first net information gain achieved over said collective information contribution by including said first individual information contribution for said first topic; d. repeating steps (b) and (c) for a second candidate contributing entity (Ep) to determine a second net information gain achieved over said collective information contribution by including a second individual information contribution for said first topic from (Ep); e. measuring an overlap of individual information units between said first individual information contribution for said first topic from (En) and said second individual information contribution for said first topic from (Ep); f. comparing said first net information gain and said second net information gain; and g. automatically recommending one or both of said first candidate contributing entity and said second candidate contributing entity to said user with the networked computing system based on results of step (e) and step (f).
 33. The method of claim 32, wherein said recommending step is also based on measuring second content outside said first topic contributed by said first candidate contributing entity and said second candidate contributing entity, which second content reduces a recommendation score for such entities.